pacman::p_load(tidytext, readtext, quanteda, tidyverse, jsonlite, igraph, tidygraph, ggraph, visNetwork, clock, graphlayouts,plotly,ggiraph)Main solution
Getting Started
Installing and loading the required libraries
Note: Ensure that the pacman package has already been installed.
The following R packages will be used:
tidytext
tidyverse
readtext
quanteda
jsonlite
igraph
tidygraph
ggraph
visNetwork
graphlayouts
plotly
ggiraph
Importing JSON File
Direct import of the mc3.json file shows an error message indicating that there’s an invalid character in the JSON text, specifically “NaN”. As “NaN” is not recognised as a valid value, preprocessing of the JSON file to replace “NaN” is required.
In the code chunk below, mc3.json is first imported, then all instances of “NaN” are replaced with “null”, and the processed file is written into a json file mc3_fixed.json for later use.
# Read the JSON file as text
json_text <- readLines("data/mc3.json")Warning in readLines("data/mc3.json"): incomplete final line found on
'data/mc3.json'
# Replace "NaN" with "null"
json_text_fixed <- gsub("NaN", "null", json_text)
# Write the fixed JSON text back to a file
writeLines(json_text_fixed, "data/mc3_fixed.json")Importing preprocessed mc3_fixed.json file
mc3_data <- fromJSON("data/mc3_fixed.json")Check dataframe
Opens new tabs within R workspace, not shown in website
Example of the view is shown in the screenshot tab below
view(mc3_data[["nodes"]])
view(mc3_data[["links"]])mc3_data[[“nodes’]


mc3_data[[“links”]]


View dataframe
- Similar info as shown above
glimpse(mc3_data)List of 5
$ directed : logi TRUE
$ multigraph: logi TRUE
$ graph : Named list()
$ nodes :'data.frame': 60520 obs. of 15 variables:
..$ type : chr [1:60520] "Entity.Organization.Company" "Entity.Organization.Company" "Entity.Organization.Company" "Entity.Organization.Company" ...
..$ country : chr [1:60520] "Uziland" "Mawalara" "Uzifrica" "Islavaragon" ...
..$ ProductServices : chr [1:60520] "Unknown" "Furniture and home accessories" "Food products" "Unknown" ...
..$ PointOfContact : chr [1:60520] "Rebecca Lewis" "Michael Lopez" "Steven Robertson" "Anthony Wyatt" ...
..$ HeadOfOrg : chr [1:60520] "Émilie-Susan Benoit" "Honoré Lemoine" "Jules Labbé" "Dr. Víctor Hurtado" ...
..$ founding_date : chr [1:60520] "1954-04-24T00:00:00" "2009-06-12T00:00:00" "2029-12-15T00:00:00" "1972-02-16T00:00:00" ...
..$ revenue : num [1:60520] 5995 71767 0 0 4747 ...
..$ TradeDescription : chr [1:60520] "Unknown" "Abbott-Gomez is a leading manufacturer and supplier of high-quality furniture and home accessories, catering to"| __truncated__ "Abbott-Harrison is a leading manufacturer of high-quality food products, including baked goods, snacks, and bev"| __truncated__ "Unknown" ...
..$ _last_edited_by : chr [1:60520] "Pelagia Alethea Mordoch" "Pelagia Alethea Mordoch" "Pelagia Alethea Mordoch" "Pelagia Alethea Mordoch" ...
..$ _last_edited_date: chr [1:60520] "2035-01-01T00:00:00" "2035-01-01T00:00:00" "2035-01-01T00:00:00" "2035-01-01T00:00:00" ...
..$ _date_added : chr [1:60520] "2035-01-01T00:00:00" "2035-01-01T00:00:00" "2035-01-01T00:00:00" "2035-01-01T00:00:00" ...
..$ _raw_source : chr [1:60520] "Existing Corporate Structure Data" "Existing Corporate Structure Data" "Existing Corporate Structure Data" "Existing Corporate Structure Data" ...
..$ _algorithm : chr [1:60520] "Automatic Import" "Automatic Import" "Automatic Import" "Automatic Import" ...
..$ id : chr [1:60520] "Abbott, Mcbride and Edwards" "Abbott-Gomez" "Abbott-Harrison" "Abbott-Ibarra" ...
..$ dob : chr [1:60520] NA NA NA NA ...
$ links :'data.frame': 75817 obs. of 11 variables:
..$ start_date : chr [1:75817] "2016-10-29T00:00:00" "2035-06-03T00:00:00" "2028-11-20T00:00:00" "2024-09-04T00:00:00" ...
..$ type : chr [1:75817] "Event.Owns.Shareholdership" "Event.Owns.Shareholdership" "Event.Owns.Shareholdership" "Event.Owns.Shareholdership" ...
..$ _last_edited_by : chr [1:75817] "Pelagia Alethea Mordoch" "Niklaus Oberon" "Pelagia Alethea Mordoch" "Pelagia Alethea Mordoch" ...
..$ _last_edited_date: chr [1:75817] "2035-01-01T00:00:00" "2035-07-15T00:00:00" "2035-01-01T00:00:00" "2035-01-01T00:00:00" ...
..$ _date_added : chr [1:75817] "2035-01-01T00:00:00" "2035-07-15T00:00:00" "2035-01-01T00:00:00" "2035-01-01T00:00:00" ...
..$ _raw_source : chr [1:75817] "Existing Corporate Structure Data" "Oceanus Corporations Monthly - Jun '35" "Existing Corporate Structure Data" "Existing Corporate Structure Data" ...
..$ _algorithm : chr [1:75817] "Automatic Import" "Manual Entry" "Automatic Import" "Automatic Import" ...
..$ source : chr [1:75817] "Avery Inc" "Berger-Hayes" "Bowers Group" "Bowman-Howe" ...
..$ target : chr [1:75817] "Allen, Nichols and Thompson" "Jensen, Morris and Downs" "Barnett Inc" "Bennett Ltd" ...
..$ key : int [1:75817] 0 0 0 0 0 0 0 0 0 0 ...
..$ end_date : chr [1:75817] NA NA NA NA ...
mc3_date[[“nodes”]] dataframe contains 15 columns and 60520 rows.
mc3_date[[“links”]] dataframe contains 11 columns and 75817 rows.
On closer inspection of mc3_data, we note some issues to be rectified:
- Columns containing dates are treated as “Character” data type instead of date data type, which is incorrect. Thus, the data type of the following fields need to be changed to “Date”” data type:
- founding_date
- _last_edited_date
- _date_added
- start_date
- _last_edited_date
- _date_added
- dob
- Some columns have missing values, which need to be handled appropriately for ease of later analysis.
- Some columns are prefixed with “_”, we remove them to reduce chance of bugs later
Missing Values
Identify the percentage of missing values within the dataset
# Function to calculate missing value percentages
calculate_missing_percentage <- function(df) {
total_values <- nrow(df) * ncol(df)
missing_values <- sum(is.na(df))
missing_percentage <- (missing_values / total_values) * 100
return(missing_percentage)
}nodes_missing_percentage <- calculate_missing_percentage(mc3_data[["nodes"]])
nodes_missing_percentage[1] 35.11952
nodes_missing_by_column <- sapply(mc3_data[["nodes"]], function(x) sum(is.na(x)) / length(x) * 100)
nodes_missing_by_column type country ProductServices PointOfContact
0.00000 0.00000 85.34204 85.38334
HeadOfOrg founding_date revenue TradeDescription
85.35691 85.34204 85.36847 85.34204
_last_edited_by _last_edited_date _date_added _raw_source
0.00000 0.00000 0.00000 0.00000
_algorithm id dob
0.00000 0.00000 14.65796
links_missing_percentage <- calculate_missing_percentage(mc3_data[["links"]])
links_missing_percentage[1] 9.059973
links_missing_by_column <- sapply(mc3_data[["links"]], function(x) sum(is.na(x)) / length(x) * 100)
links_missing_by_column start_date type _last_edited_by _last_edited_date
0.1187069 0.0000000 0.0000000 0.0000000
_date_added _raw_source _algorithm source
0.0000000 0.0000000 0.0000000 0.0000000
target key end_date
0.0000000 0.0000000 99.5410000
Nodes: Overall, there are 35.12% missing values. While most columns have no missing values, the majority of those with missing data pertain to optional attributes:
ProductServices (Optional) - 85.34%
PointOfContact (Optional)- 85.38%
HeadofOrg (Optional) - 85.36%
founding_date - 85.34%
revenue (Optional) - 85.37%
TradeDescription (Optional) - 85.34%
dob - 14.66%
Links: Overall, there are 9.06% missing values. Most of the columns do not contain missing values, except for:
start_date - 0.12%
end_date (Optional) - 99.54%
In addition, according to the VAST2024 - MC3 Data Description file, all empty values in the revenue column are supposed to have been set to 0. However, there are still some values with “NA”.
Setting empty values in revenue to 0
Set NA values to 0 to aid analysis
# Create a copy of mc3_data
mc3_data2 <- mc3_data
# Set empty values in revenue to 0 and save it to the new list
mc3_data2$nodes$revenue <- ifelse(is.na(mc3_data2$nodes$revenue) | mc3_data2$nodes$revenue == "", 0, mc3_data2$nodes$revenue)Verify changes
# ensure no more missing values in revenue column
sum(is.na(mc3_data2$nodes$revenue))[1] 0
Rename Columns
Remove prefix “_” from columns to reduce chance of issues later
# Function to remove leading underscores from column names
remove_leading_underscores <- function(df) {
colnames(df) <- gsub("^_", "", colnames(df))
return(df)
}
# Create a copy of mc3_data2 and name it mc3_data3
mc3_data3 <- mc3_data2
# Apply the function to the nodes and links data frames in mc3_data3
mc3_data3$nodes <- remove_leading_underscores(mc3_data3$nodes)
mc3_data3$links <- remove_leading_underscores(mc3_data3$links)Verify changes
colnames(mc3_data3$nodes) [1] "type" "country" "ProductServices" "PointOfContact"
[5] "HeadOfOrg" "founding_date" "revenue" "TradeDescription"
[9] "last_edited_by" "last_edited_date" "date_added" "raw_source"
[13] "algorithm" "id" "dob"
colnames(mc3_data3$links) [1] "start_date" "type" "last_edited_by" "last_edited_date"
[5] "date_added" "raw_source" "algorithm" "source"
[9] "target" "key" "end_date"
Standardising Date Time Formats
In preparation for temporal analysis
# Create a copy of mc3_data3 and name it mc3_data4
mc3_data4 <- mc3_data3
# Convert date columns to Date-Time type
mc3_data4$nodes <- mc3_data4$nodes %>%
mutate(
founding_date = ymd_hms(founding_date),
last_edited_date = ymd_hms(last_edited_date),
date_added = ymd_hms(date_added),
dob = ymd_hms(dob)
)Warning: There was 1 warning in `mutate()`.
ℹ In argument: `dob = ymd_hms(dob)`.
Caused by warning:
! 176 failed to parse.
mc3_data4$links <- mc3_data4$links %>%
mutate(
start_date = ymd_hms(start_date),
last_edited_date = ymd_hms(last_edited_date),
date_added = ymd_hms(date_added),
end_date = ymd_hms(end_date)
)Warning: There was 1 warning in `mutate()`.
ℹ In argument: `start_date = ymd_hms(start_date)`.
Caused by warning:
! 14630 failed to parse.
The ymd_hms function is designed to work with character vectors and will return NA for any existing NA values. This means that any NA value in the original columns will remain NA after the conversion.
Verify changes
# View the first few rows of the date columns in nodes
head(mc3_data4$nodes %>% select(founding_date, last_edited_date, date_added, dob)) founding_date last_edited_date date_added dob
1 1954-04-24 2035-01-01 2035-01-01 <NA>
2 2009-06-12 2035-01-01 2035-01-01 <NA>
3 2029-12-15 2035-01-01 2035-01-01 <NA>
4 1972-02-16 2035-01-01 2035-01-01 <NA>
5 1954-04-06 2035-01-01 2035-01-01 <NA>
6 2031-09-30 2035-01-01 2035-01-01 <NA>
# View the first few rows of the date columns in links
head(mc3_data4$links %>% select(start_date)) start_date
1 2016-10-29
2 2035-06-03
3 2028-11-20
4 2024-09-04
5 2034-11-12
6 2007-04-06
# Summary of date columns in nodes
summary(mc3_data4$nodes %>% select(founding_date, last_edited_date, date_added, dob)) founding_date last_edited_date
Min. :1945-01-01 00:00:00.000 Min. :2035-01-01 00:00:00.0
1st Qu.:1968-01-11 00:00:00.000 1st Qu.:2035-01-01 00:00:00.0
Median :1991-07-03 00:00:00.000 Median :2035-01-01 00:00:00.0
Mean :1991-04-22 15:54:58.072 Mean :2035-01-02 10:34:13.4
3rd Qu.:2014-09-04 12:00:00.000 3rd Qu.:2035-01-01 00:00:00.0
Max. :2035-12-29 00:00:00.000 Max. :2036-01-15 00:00:00.0
NA's :51649
date_added dob
Min. :2035-01-01 00:00:00.0 Min. :1970-01-02 00:00:00.000
1st Qu.:2035-01-01 00:00:00.0 1st Qu.:1978-01-30 00:00:00.000
Median :2035-01-01 00:00:00.0 Median :1986-02-06 00:00:00.000
Mean :2035-01-02 10:28:32.2 Mean :1987-05-23 22:21:33.182
3rd Qu.:2035-01-01 00:00:00.0 3rd Qu.:1995-05-13 00:00:00.000
Max. :2036-01-15 00:00:00.0 Max. :2017-03-20 00:00:00.000
NA's :9047
# Summary of date columns in links
summary(mc3_data4$links %>% select(start_date)) start_date
Min. :1952-05-31 00:00:00.00
1st Qu.:2015-08-18 00:00:00.00
Median :2024-03-22 00:00:00.00
Mean :2022-11-23 10:50:43.11
3rd Qu.:2030-12-13 00:00:00.00
Max. :2035-12-29 00:00:00.00
NA's :14720
# Check the types of the date columns in nodes
str(mc3_data4$nodes %>% select(founding_date, last_edited_date, date_added, dob))'data.frame': 60520 obs. of 4 variables:
$ founding_date : POSIXct, format: "1954-04-24" "2009-06-12" ...
$ last_edited_date: POSIXct, format: "2035-01-01" "2035-01-01" ...
$ date_added : POSIXct, format: "2035-01-01" "2035-01-01" ...
$ dob : POSIXct, format: NA NA ...
# Check the types of the date columns in links
str(mc3_data4$links %>% select(start_date))'data.frame': 75817 obs. of 1 variable:
$ start_date: POSIXct, format: "2016-10-29" "2035-06-03" ...
view(mc3_data4[["nodes"]])
view(mc3_data4[["links"]])Split Words
The steps below will be used to split text in type column of nodes into two columns: namely type1 and type2.
# Make a copy of mc3_data4
mc3_data5 <- mc3_data4
# Split the type column into two columns
mc3_data5$nodes <- mc3_data5$nodes %>%
mutate(
type1 = sub("^(\\S+).*", "\\1", type),
type2 = sub("^\\S+\\.(.*)", "\\1", type)
)
# If there's only one word in type, set type2 to NA
mc3_data5$nodes$type2 <- ifelse(grepl("\\.", mc3_data5$nodes$type), mc3_data5$nodes$type2, NA)
# Remove the original 'type' column
mc3_data5$nodes <- mc3_data5$nodes %>%
select(-type)The steps below will be used to split text in type column of links into two columns: namely type1 and type2.
# Make a copy of mc3_data4
mc3_data6 <- mc3_data5
# Split the type column into two columns
# There are no special cases, exception left blank
mc3_data6$links <- mc3_data6$links %>%
mutate(
type1 = sub("(.*?\\..*?)(\\.[^.]+)?$", "\\1", type),
type2 = ifelse(grepl("\\.", type), sub(".*\\.", "", type), "")
)
# remove the original 'type' column
mc3_data6$links <- mc3_data6$links %>%
select(-type)Verify changes
# View the first few rows of the type columns in nodes
head(mc3_data6$nodes %>% select(type1,type2)) type1 type2
1 Entity.Organization.Company Company
2 Entity.Organization.Company Company
3 Entity.Organization.Company Company
4 Entity.Organization.Company Company
5 Entity.Organization.Company Company
6 Entity.Organization.Company Company
# View the first few rows of the type columns in links
head(mc3_data6$links %>% select(type1,type2)) type1 type2
1 Event.Owns Shareholdership
2 Event.Owns Shareholdership
3 Event.Owns Shareholdership
4 Event.Owns Shareholdership
5 Event.Owns Shareholdership
6 Event.Owns Shareholdership
view(mc3_data6[["nodes"]])
view(mc3_data6[["links"]])Extract Nodes
For Question 1
#keep only necessary columns
mc3_nodes_1 <- as_tibble(mc3_data6$nodes) %>%
select (-TradeDescription,
-last_edited_by,
-last_edited_date,
-algorithm,
-dob,
-type1)Save as rds file for future use
write_rds(mc3_nodes_1, "data/rds/mc3_nodes_1.rds")Extract Links
For Question 1
mc3_links_1 <- as_tibble(mc3_data6$links) %>%
select (-last_edited_by,
-last_edited_date,
-date_added,
-key,
-algorithm,
-type1,
-end_date)Save as rds file for future use
write_rds(mc3_links_1, "data/rds/mc3_links_1.rds")Load Data
Load rds file
Note: rds files can be loaded independently of the data wrangling steps above to save time
mc3_links_1 <- readRDS("data/rds/mc3_links_1.rds")
mc3_nodes_1 <- readRDS("data/rds/mc3_nodes_1.rds")Question 1
Changes in Corporate Structures Over Time
The plot shows how transaction volume changes over time, which helps identify periods of increased or decreased activity
transactions_over_time <- mc3_links_1 %>%
group_by(start_date) %>%
summarize(count = n()) %>%
drop_na()Number of Transactions over Time
Number of links can be used to determine transactions over time

ggplot(transactions_over_time, aes(x = start_date, y = count)) +
geom_line() +
labs(title = "Transactions Over Time", x = "Date", y = "Number of Transactions")The dataset spans from year 1952 to 2035.
We can see that from the start of the dataset until about year 2000, there were relatively few transactions. There was a small spike after year 2000, proceeded by exponential growth around 2005. However, there was a dip in transactions in 2035.
The dip could be due to effects after SouthSeafood Express Corp was caught for illegal behaviour and eventually closed in 2035.
Analysis should focus on transactions from year 2005 onwards. Data analysed should also be aggregated by year.
Filter data
Filter data to only keep transactions from 2000 (5 years before 2005) to 2035 (end of dataset). We keep some data that occurs before the start of our period of interest to capture any recent changes to entities.
# Filter the data frames to keep only data from the year 2000 and onwards
mc3_links_1_filtered <- mc3_links_1 %>%
filter(start_date >= as.Date("2000-01-01"))Aggregate Data by Year
# Extract year for aggregation
mc3_links_1_filtered2 <- mc3_links_1_filtered %>%
mutate(transaction_year = year(start_date))
# Calculate the number of transactions per year
yearly_txns <- mc3_links_1_filtered2 %>%
group_by(transaction_year) %>%
summarise(num_transactions = n())
# Plot the number of transactions per year
ggplot(yearly_txns, aes(x = transaction_year, y = num_transactions)) +
geom_line(color = "blue") +
labs(title = "Number of Transactions Per Year",
x = "Year",
y = "Number of Transactions") +
theme_minimal()It is now clearer that the rapid growth in transactions started around 2005, before reaching its peak at 2034 and sharply dropping in 2035, likely due to after effects of the SouthSeafood Express Corp incident.
Number of Active Companies Per Year
Drop na values
mc3_nodes2_1 <- mc3_nodes_1 %>%
drop_na(founding_date) # removes Persons and Persons CEONumber of nodes can be used to determine the number of active companies per year.
# Extract year for aggregation
mc3_nodes3_1 <- mc3_nodes2_1 %>%
mutate(active_year = floor_date(founding_date, "year"))
# Calculate the number of active companies per year
active_companies <- mc3_nodes3_1 %>%
group_by(active_year) %>%
summarise(num_active_companies = n())Summary
# Calculate the summary statistics
summary_stats <- summary(active_companies$num_active_companies)
summary_stats Min. 1st Qu. Median Mean 3rd Qu. Max.
79.00 88.50 96.00 97.48 104.00 136.00
# Extract and save the mean
# Round to 2 decimal places
mean_active_companies <- round(summary_stats["Mean"], 2)Plot graph

# Plot the number of active companies over time
ggplot(active_companies, aes(x = active_year, y = num_active_companies)) +
# line plot
geom_line(color = "darkgreen") +
labs(title = "Number of Active Companies Over Time",
x = "Date",
y = "Number of Active Companies") +
# mean line
geom_hline(aes(yintercept = mean_active_companies),
linetype = "dotted", color = "blue") +
annotate("text", x = min(active_companies$active_year),
y = mean_active_companies,
label = paste("Mean:", mean_active_companies),
hjust = 0, vjust = -1, color = "blue") +
theme_minimal()While there are fluctuations in the number of active companies over time, there is generally an increasing trend of the number of active companies over time, especially around 2010 onwards. This period shows a rising trend with the number of active companies reaching the highest values in the dataset. This is similar to that observed in the number of transactions over time, seen above.
We also see a dip around 2035, before the numbers increase again. Also likely due to the after effects of the SouthSeafood Express Corp incident.
Centrality Measures
Modifying network nodes and edges
Prepare the edges dataframe for network analysis by:
Ensuring all edges are unique.
Converting columns to a uniform type.
Calculating the weight of each edge (how many times each connection occurs).
Removing any self-loops.
mc3_edges <-
as_tibble(mc3_links_1_filtered2) %>%
distinct() %>%
mutate(source = as.character(source),
target = as.character(target),
type = as.character(type2),
tyear = as.integer(transaction_year)) %>%
group_by(source, target, type,tyear) %>%
summarise(weights = n()) %>%
filter(source != target) %>%
ungroup()`summarise()` has grouped output by 'source', 'target', 'type'. You can
override using the `.groups` argument.
The resulting mc3_edges tibble contains the columns source, target, type, year, and weights, where each row represents a unique edge between two nodes with a specific type, and the weights column represents the number of times that edge occurs.
Clean and preprocess the nodes data by:
Ensuring that each column has the correct data type for analysis.
Selecting only the necessary columns for further analysis or visualization.
mc3_nodes <- as_tibble(mc3_nodes_1) %>%
mutate(country = as.character(country),
id = as.character(id),
ProductServices = as.character(ProductServices),
revenue = as.numeric(as.character(revenue)),
type = as.character(type2)) %>%
select(id, country, type, revenue, ProductServices)The resulting mc3_nodes tibble contains the cleaned and correctly typed columns id, country, type, revenue, and ProductServices.
Keeping unique values
Edges
unique_transaction_types_edges <- mc3_edges %>%
select(type) %>%
distinct()
# Display the unique transaction types
print(unique_transaction_types_edges)# A tibble: 4 × 1
type
<chr>
1 Shareholdership
2 BeneficialOwnership
3 WorksFor
4 FamilyRelationship
There are 4 types of edges, namely:
Shareholdership
WorksFor
BeneficialOwnership
FamilyRelationship
Nodes
unique_transaction_types_nodes <- mc3_nodes %>%
select(type) %>%
distinct()
# Display the unique transaction types
print(unique_transaction_types_nodes)# A tibble: 8 × 1
type
<chr>
1 Company
2 LogisticsCompany
3 FishingCompany
4 FinancialCompany
5 NewsCompany
6 NGO
7 Person
8 CEO
There are 8 types of nodes, namely:
Company
LogisticsCompany
FishingCompany
FinancialCompany
NewsCompany
NGO
Person
CEO
Extract all the source and target nodes
Extract all the source and target nodes, then, drop any unmatched nodes
id1 <- mc3_edges %>%
select(source) %>%
rename(id = source)
id2 <- mc3_edges %>%
select(target) %>%
rename(id = target)
mc3_nodes1 <- rbind(id1, id2) %>%
distinct() %>%
left_join(mc3_nodes, by = c("id" = "id")) %>%
mutate(unmatched = "drop")Verify results
print(mc3_nodes1)# A tibble: 60,489 × 6
id country type revenue ProductServices unmatched
<chr> <chr> <chr> <dbl> <chr> <chr>
1 4. SeaCargo Ges.m.b.H. Oceanus Logisti… 23304. Tuna, sword fi… drop
2 9. RiverLine CJSC Oceanus Company 50134. Unknown drop
3 Aaron Acosta Mawalara Person 0 <NA> drop
4 Aaron Allen Galduzim Person 0 <NA> drop
5 Aaron Austin Kethilim Person 0 <NA> drop
6 Aaron Baker Azurionix Person 0 <NA> drop
7 Aaron Barry Kondanovia Person 0 <NA> drop
8 Aaron Bauer Rio Solovia Person 0 <NA> drop
9 Aaron Bishop Osterivaro Person 0 <NA> drop
10 Aaron Bolton n.a. Person 0 <NA> drop
# ℹ 60,479 more rows
Create Graph Object
Create graph object and calculate centrality measures
mc3_graph <- tbl_graph(nodes = mc3_nodes1, edges = mc3_edges, directed = TRUE) %>%
mutate(betweenness_centrality = centrality_betweenness(),
closeness_centrality = centrality_closeness())
theme_graph()
## List of 136
## $ line :List of 6
## ..$ colour : chr "black"
## ..$ linewidth : num 0.5
## ..$ linetype : num 1
## ..$ lineend : chr "butt"
## ..$ arrow : logi FALSE
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_line" "element"
## $ rect :List of 5
## ..$ fill : chr "white"
## ..$ colour : chr "black"
## ..$ linewidth : num 0.5
## ..$ linetype : num 1
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_rect" "element"
## $ text :List of 11
## ..$ family : chr "Arial Narrow"
## ..$ face : chr "plain"
## ..$ colour : chr "black"
## ..$ size : num 11
## ..$ hjust : num 0.5
## ..$ vjust : num 0.5
## ..$ angle : num 0
## ..$ lineheight : num 0.9
## ..$ margin : 'margin' num [1:4] 0points 0points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : logi FALSE
## ..$ inherit.blank: logi FALSE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ title : NULL
## $ aspect.ratio : NULL
## $ axis.title : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ axis.title.x :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 1
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 2.75points 0points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.title.x.top :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 0
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 2.75points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.title.x.bottom : NULL
## $ axis.title.y :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 1
## ..$ angle : num 90
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 2.75points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.title.y.left : NULL
## $ axis.title.y.right :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 1
## ..$ angle : num -90
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 0points 2.75points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ axis.text.x :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 1
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 2.2points 0points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.x.top :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 0
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 2.2points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.x.bottom : NULL
## $ axis.text.y :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : num 1
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 2.2points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.y.left : NULL
## $ axis.text.y.right :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : num 0
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 0points 2.2points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.theta : NULL
## $ axis.text.r :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : num 0.5
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 2.2points 0points 2.2points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.ticks : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ axis.ticks.x : NULL
## $ axis.ticks.x.top : NULL
## $ axis.ticks.x.bottom : NULL
## $ axis.ticks.y : NULL
## $ axis.ticks.y.left : NULL
## $ axis.ticks.y.right : NULL
## $ axis.ticks.theta : NULL
## $ axis.ticks.r : NULL
## $ axis.minor.ticks.x.top : NULL
## $ axis.minor.ticks.x.bottom : NULL
## $ axis.minor.ticks.y.left : NULL
## $ axis.minor.ticks.y.right : NULL
## $ axis.minor.ticks.theta : NULL
## $ axis.minor.ticks.r : NULL
## $ axis.ticks.length : 'simpleUnit' num 2.75points
## ..- attr(*, "unit")= int 8
## $ axis.ticks.length.x : NULL
## $ axis.ticks.length.x.top : NULL
## $ axis.ticks.length.x.bottom : NULL
## $ axis.ticks.length.y : NULL
## $ axis.ticks.length.y.left : NULL
## $ axis.ticks.length.y.right : NULL
## $ axis.ticks.length.theta : NULL
## $ axis.ticks.length.r : NULL
## $ axis.minor.ticks.length : 'rel' num 0.75
## $ axis.minor.ticks.length.x : NULL
## $ axis.minor.ticks.length.x.top : NULL
## $ axis.minor.ticks.length.x.bottom: NULL
## $ axis.minor.ticks.length.y : NULL
## $ axis.minor.ticks.length.y.left : NULL
## $ axis.minor.ticks.length.y.right : NULL
## $ axis.minor.ticks.length.theta : NULL
## $ axis.minor.ticks.length.r : NULL
## $ axis.line : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ axis.line.x : NULL
## $ axis.line.x.top : NULL
## $ axis.line.x.bottom : NULL
## $ axis.line.y : NULL
## $ axis.line.y.left : NULL
## $ axis.line.y.right : NULL
## $ axis.line.theta : NULL
## $ axis.line.r : NULL
## $ legend.background : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ legend.margin : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
## ..- attr(*, "unit")= int 8
## $ legend.spacing : 'simpleUnit' num 11points
## ..- attr(*, "unit")= int 8
## $ legend.spacing.x : NULL
## $ legend.spacing.y : NULL
## $ legend.key : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ legend.key.size : 'simpleUnit' num 1.2lines
## ..- attr(*, "unit")= int 3
## $ legend.key.height : NULL
## $ legend.key.width : NULL
## $ legend.key.spacing : 'simpleUnit' num 5.5points
## ..- attr(*, "unit")= int 8
## $ legend.key.spacing.x : NULL
## $ legend.key.spacing.y : NULL
## $ legend.frame : NULL
## $ legend.ticks : NULL
## $ legend.ticks.length : 'rel' num 0.2
## $ legend.axis.line : NULL
## $ legend.text :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : 'rel' num 0.8
## ..$ hjust : NULL
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ legend.text.position : NULL
## $ legend.title :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : num 0
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ legend.title.position : NULL
## $ legend.position : chr "right"
## $ legend.position.inside : NULL
## $ legend.direction : NULL
## $ legend.byrow : NULL
## $ legend.justification : chr "center"
## $ legend.justification.top : NULL
## $ legend.justification.bottom : NULL
## $ legend.justification.left : NULL
## $ legend.justification.right : NULL
## $ legend.justification.inside : NULL
## $ legend.location : NULL
## $ legend.box : NULL
## $ legend.box.just : NULL
## $ legend.box.margin : 'margin' num [1:4] 0cm 0cm 0cm 0cm
## ..- attr(*, "unit")= int 1
## $ legend.box.background : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ legend.box.spacing : 'simpleUnit' num 11points
## ..- attr(*, "unit")= int 8
## [list output truncated]
## - attr(*, "class")= chr [1:2] "theme" "gg"
## - attr(*, "complete")= logi TRUE
## - attr(*, "validate")= logi TRUENetwork Graph
Warning: The `trans` argument of `continuous_scale()` is deprecated as of ggplot2 3.5.0.
ℹ Please use the `transform` argument instead.

# Display the network graph
ggraph(mc3_graph, layout = "fr") + # Using Fruchterman-Reingold layout
geom_edge_link(aes(edge_alpha = 0.8, edge_width = 0.8)) + # Customize edge appearance
geom_node_point(aes(size = betweenness_centrality, color = closeness_centrality)) + # Customize node appearance
scale_color_viridis_c() + # Use viridis color scale
theme_void() + # Use a void theme
labs(title = "Refined Network Graph of Atypical Business Transactions",
subtitle = "Nodes colored by closeness centrality and sized by betweenness centrality",
caption = "Data Source: mc3.json") # Add titles and captionsThe grey circular portion in the center of the network graph does not represent any specific data or entities. It is a visual byproduct resulting from the dense clustering of nodes and edges in that central region. This effect is particularly noticeable in dense, highly interconnected network visualizations where nodes and edges are concentrated in a small space.
Thus, we filter the nodes to refine the graph.
Refined Network Graph
Top Nodes
Identify top 20 nodes by betweenness centrality
# Identify top nodes by betweenness centrality
top_nodes <- mc3_graph %>%
as_tibble()
# Identify top 20 nodes by betweenness centrality
top_nodes2 <- top_nodes %>%
top_n(20, wt = betweenness_centrality)List of most active people and businesses
top_nodes2# A tibble: 21 × 8
id country type revenue ProductServices unmatched betweenness_centrality
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
1 Corte… Mawala… Comp… 6.99e3 Finish carpent… drop 22
2 Evans… Oceanus Fish… 5.50e4 Processing and… drop 29
3 Fried… Mawand… Comp… 1.64e4 Grocery produc… drop 38
4 Gvard… Nalaki… Comp… 6.85e4 Shipping servi… drop 33
5 Hill … Oceanus Comp… 4.75e3 Unknown drop 31
6 Howel… Mawand… Comp… 7.74e6 High-grade met… drop 54
7 Johns… Valtal… Comp… 3.35e4 Machinery and … drop 33
8 Kaise… Isla S… Comp… 2.32e4 Canned and cur… drop 26
9 King … Oceanus Comp… 0 Operation of i… drop 29
10 Lane … Imazam Fish… 4.80e3 Fish and seafo… drop 33
# ℹ 11 more rows
# ℹ 1 more variable: closeness_centrality <dbl>
It is likely that these entities on the top 10 list are big players in the industry and control information and resources.
High betweenness centrality means that a node plays a more crucial role in connecting other nodes. It can be an indicator of:
Brokerage Role: Nodes with high betweenness centrality often act as bridges or intermediaries between different parts of the network. They control the flow of information, resources, or interactions between other nodes.
Control and Influence: Nodes with high betweenness centrality have the potential to control the flow of information or resources in the network. They may have more influence or power over the network dynamics compared to other nodes.
Plot refined graph

# Extract IDs of top nodes
# Extract IDs of top nodes
top_node_ids <- top_nodes$id
# Filter the graph to include only top nodes and their incident edges
top_graph <- mc3_graph %>%
activate(nodes) %>%
filter(id %in% top_node_ids) %>%
activate(edges) %>%
filter(edge_is_incident(top_node_ids))
# Plot the network graph with top nodes
ggraph(top_graph, layout = "fr") +
geom_edge_link(aes(edge_alpha = 0.1, edge_width = 0.1)) +
geom_node_point(aes(size = betweenness_centrality, color = closeness_centrality)) +
scale_color_viridis_c() +
theme_void() +
labs(title = "Top 20 Nodes Network Graph",
subtitle = "Nodes colored by closeness centrality and sized by betweenness centrality",
caption = "Data Source: mc3.json")We can see that for the top nodes, they are highly interconnected. To the extent that the graph becomes less interpretable.
Question 2
Centrality Over Time
Plot Graph
# Assuming you have a function to calculate centrality for each year
calculate_centrality_over_time <- function(nodes, edges, time_unit = "year")
{
edges <- edges %>%
mutate(period = as.Date(paste0(tyear, "-01-01")))
centrality_results <- edges %>%
group_by(period) %>%
do({
current_edges <- .
current_nodes <- nodes %>% filter(id %in% unique(c(current_edges$source, current_edges$target)))
graph <- tbl_graph(nodes = current_nodes, edges = current_edges, directed = TRUE)
graph %>%
mutate(betweenness = centrality_betweenness()) %>%
as_tibble() %>%
summarise(mean_betweenness = mean(betweenness, na.rm = TRUE))
}) %>%
ungroup()
return(centrality_results)
}
# Calculate centrality measures over time
centrality_over_time <- calculate_centrality_over_time(mc3_nodes1, mc3_edges)
# Plot centrality measures over time
ggplot(centrality_over_time, aes(x = period, y = mean_betweenness)) +
geom_line(color = "red") +
labs(title = "Average Betweenness Centrality Per Year",
x = "Year",
y = "Mean Betweenness Centrality") +
theme_minimal()
This graph shows that there is a increasing trend of the average betweenness centrality per year over time.
From around 2000 to 2020, the average betweenness centrality remains relatively low and fluctuates within a small range, indicating a stable network structure during this period. However, starting around 2025, there is a sharp and dramatic increase in the average betweenness centrality.
This sudden rise suggests a significant change in the network dynamics, where certain nodes or entities are becoming increasingly important as bridges or intermediaries connecting different parts of the network. Such a drastic increase could potentially indicate the emergence of new influential players, changes in transaction patterns, or the formation of new connections and pathways within the network.
The rapid growth in average betweenness centrality implies that the network structure is becoming more centralized, with a smaller number of nodes acting as critical hubs or gatekeepers, controlling the flow of information or transactions within the network.
Key Influencers
Extract key influencers and their edges
# Filter mc3_edges to keep rows where source ID is in top_nodes2
keypersonnel <- mc3_edges %>%
filter(source %in% top_nodes2$id)Key influencers of the industry
unique(keypersonnel$source) [1] "Cortez LLC" "Evans-Pearson"
[3] "Friedman, Gibson and Garcia" "GvardeyskAmerica Shipping Plc"
[5] "Hill PLC" "Howell LLC"
[7] "Johnson, Perez and Salinas" "Kaiser, Warren and Shepard"
[9] "King and Sons" "Lane Group"
[11] "Lee-Ramirez" "Mcpherson-Wright"
[13] "NamRiver Transit A/S" "Osborne, Saunders and Brown"
[15] "Patel-Miller" "Ramos, Jordan and Stewart"
[17] "Rivera, Lee and Carroll" "Russell and Sons"
[19] "Stein, Taylor and Williams" "StichtingMarine Shipping Company"
[21] "Vasquez-Gonzalez"
The key influencers are:
Cortez LLC
Evans-Pearson
Friedman, Gibson and Garcia
GvardeyskAmerica Shipping Plc
Hill PLC
Howell LLC
Johnson, Perez and Salinas
Kaiser, Warren and Shepard
King and Sons
Lane Group
Lee-Ramirez
Mcpherson-Wright
NamRiver Transit A/S
Osborne, Saunders and Brown
Patel-Miller
Ramos, Jordan and Stewart
Rivera, Lee and Carroll Russell and Sons
Stein, Taylor and Williams
StichtingMarine Shipping Company
Vasquez-Gonzalez
Relationship between influencers and their links
unique(keypersonnel$type)[1] "Shareholdership"
There is only 1 type of relationship between the influencers and their links. The influencers are shareholders of those they are linked to.
Network Graph
Create Graph Object
# Create a nodes dataframe from the unique source and target values
nodes <- unique(c(keypersonnel$source, keypersonnel$target)) %>%
data.frame(name = .)
# Create the graph object using tbl_graph
graph_data <- tbl_graph(nodes = nodes,
edges = keypersonnel %>%
rename(from = source, to = target),
directed = TRUE)Plot Graph

# Plot the directed graph
ggraph(graph_data, layout = "fr") + # Using Fruchterman-Reingold layout
geom_edge_link(aes(label = as.character(tyear)), # Only label with tyear
arrow = arrow(length = unit(4, 'mm')), # Add arrows to indicate direction
end_cap = circle(3, 'mm'), # Cap the end of the edges with a circle
label_dodge = unit(2, "mm"), # Adjust label position to avoid overlap
label_size = 3, # Set label size
edge_width = 0.8, # Set edge width
edge_alpha = 0.8) + # Set edge transparency
geom_node_point(size = 5, color = "blue") + # Customize node appearance
geom_node_text(aes(label = name), vjust = 1.5, size = 4) + # Add node labels
theme_void() + # Use a void theme
labs(title = "Directed Network Graph of Key Personnel Transactions",
subtitle = "Nodes represent unique sources and targets, edges labeled with year",
caption = "Data Source: keypersonnel") # Add titles and captionsMost number of links:
| Entity Name | Number of Links |
|---|---|
| GvardeyskAmerica Shipping Plc | 4 |
| Rivera, Lee and Carroll | 3 |
| Cortez LLC | 2 |
| Kaiser, Warren and Shepard | 2 |
| Mcpherson-Wright | 2 |
| Patel-Miller | 2 |
| StichtingMarine Shipping Company | 2 |
| Vasquez-Gonzalez | 2 |
The earliest link: Lane Group has been the shareholder of Howell LLC since 2020.
The most recent link: GvardeyskAmerica Shipping Plc is the shareholder of ArawakFish Cargo Ges.m.b.H.. since 2034.
Finding atypical Business Relationships
# Creating the graph object
mc3_graph1 <- tbl_graph(nodes = mc3_nodes1, edges = mc3_edges, directed = FALSE) %>%
mutate(betweenness_centrality = centrality_betweenness(),
closeness_centrality = centrality_closeness())
theme_graph()List of 136
$ line :List of 6
..$ colour : chr "black"
..$ linewidth : num 0.5
..$ linetype : num 1
..$ lineend : chr "butt"
..$ arrow : logi FALSE
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_line" "element"
$ rect :List of 5
..$ fill : chr "white"
..$ colour : chr "black"
..$ linewidth : num 0.5
..$ linetype : num 1
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_rect" "element"
$ text :List of 11
..$ family : chr "Arial Narrow"
..$ face : chr "plain"
..$ colour : chr "black"
..$ size : num 11
..$ hjust : num 0.5
..$ vjust : num 0.5
..$ angle : num 0
..$ lineheight : num 0.9
..$ margin : 'margin' num [1:4] 0points 0points 0points 0points
.. ..- attr(*, "unit")= int 8
..$ debug : logi FALSE
..$ inherit.blank: logi FALSE
..- attr(*, "class")= chr [1:2] "element_text" "element"
$ title : NULL
$ aspect.ratio : NULL
$ axis.title : list()
..- attr(*, "class")= chr [1:2] "element_blank" "element"
$ axis.title.x :List of 11
..$ family : NULL
..$ face : NULL
..$ colour : NULL
..$ size : NULL
..$ hjust : NULL
..$ vjust : num 1
..$ angle : NULL
..$ lineheight : NULL
..$ margin : 'margin' num [1:4] 2.75points 0points 0points 0points
.. ..- attr(*, "unit")= int 8
..$ debug : NULL
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_text" "element"
$ axis.title.x.top :List of 11
..$ family : NULL
..$ face : NULL
..$ colour : NULL
..$ size : NULL
..$ hjust : NULL
..$ vjust : num 0
..$ angle : NULL
..$ lineheight : NULL
..$ margin : 'margin' num [1:4] 0points 0points 2.75points 0points
.. ..- attr(*, "unit")= int 8
..$ debug : NULL
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_text" "element"
$ axis.title.x.bottom : NULL
$ axis.title.y :List of 11
..$ family : NULL
..$ face : NULL
..$ colour : NULL
..$ size : NULL
..$ hjust : NULL
..$ vjust : num 1
..$ angle : num 90
..$ lineheight : NULL
..$ margin : 'margin' num [1:4] 0points 2.75points 0points 0points
.. ..- attr(*, "unit")= int 8
..$ debug : NULL
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_text" "element"
$ axis.title.y.left : NULL
$ axis.title.y.right :List of 11
..$ family : NULL
..$ face : NULL
..$ colour : NULL
..$ size : NULL
..$ hjust : NULL
..$ vjust : num 1
..$ angle : num -90
..$ lineheight : NULL
..$ margin : 'margin' num [1:4] 0points 0points 0points 2.75points
.. ..- attr(*, "unit")= int 8
..$ debug : NULL
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_text" "element"
$ axis.text : list()
..- attr(*, "class")= chr [1:2] "element_blank" "element"
$ axis.text.x :List of 11
..$ family : NULL
..$ face : NULL
..$ colour : NULL
..$ size : NULL
..$ hjust : NULL
..$ vjust : num 1
..$ angle : NULL
..$ lineheight : NULL
..$ margin : 'margin' num [1:4] 2.2points 0points 0points 0points
.. ..- attr(*, "unit")= int 8
..$ debug : NULL
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_text" "element"
$ axis.text.x.top :List of 11
..$ family : NULL
..$ face : NULL
..$ colour : NULL
..$ size : NULL
..$ hjust : NULL
..$ vjust : num 0
..$ angle : NULL
..$ lineheight : NULL
..$ margin : 'margin' num [1:4] 0points 0points 2.2points 0points
.. ..- attr(*, "unit")= int 8
..$ debug : NULL
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_text" "element"
$ axis.text.x.bottom : NULL
$ axis.text.y :List of 11
..$ family : NULL
..$ face : NULL
..$ colour : NULL
..$ size : NULL
..$ hjust : num 1
..$ vjust : NULL
..$ angle : NULL
..$ lineheight : NULL
..$ margin : 'margin' num [1:4] 0points 2.2points 0points 0points
.. ..- attr(*, "unit")= int 8
..$ debug : NULL
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_text" "element"
$ axis.text.y.left : NULL
$ axis.text.y.right :List of 11
..$ family : NULL
..$ face : NULL
..$ colour : NULL
..$ size : NULL
..$ hjust : num 0
..$ vjust : NULL
..$ angle : NULL
..$ lineheight : NULL
..$ margin : 'margin' num [1:4] 0points 0points 0points 2.2points
.. ..- attr(*, "unit")= int 8
..$ debug : NULL
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_text" "element"
$ axis.text.theta : NULL
$ axis.text.r :List of 11
..$ family : NULL
..$ face : NULL
..$ colour : NULL
..$ size : NULL
..$ hjust : num 0.5
..$ vjust : NULL
..$ angle : NULL
..$ lineheight : NULL
..$ margin : 'margin' num [1:4] 0points 2.2points 0points 2.2points
.. ..- attr(*, "unit")= int 8
..$ debug : NULL
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_text" "element"
$ axis.ticks : list()
..- attr(*, "class")= chr [1:2] "element_blank" "element"
$ axis.ticks.x : NULL
$ axis.ticks.x.top : NULL
$ axis.ticks.x.bottom : NULL
$ axis.ticks.y : NULL
$ axis.ticks.y.left : NULL
$ axis.ticks.y.right : NULL
$ axis.ticks.theta : NULL
$ axis.ticks.r : NULL
$ axis.minor.ticks.x.top : NULL
$ axis.minor.ticks.x.bottom : NULL
$ axis.minor.ticks.y.left : NULL
$ axis.minor.ticks.y.right : NULL
$ axis.minor.ticks.theta : NULL
$ axis.minor.ticks.r : NULL
$ axis.ticks.length : 'simpleUnit' num 2.75points
..- attr(*, "unit")= int 8
$ axis.ticks.length.x : NULL
$ axis.ticks.length.x.top : NULL
$ axis.ticks.length.x.bottom : NULL
$ axis.ticks.length.y : NULL
$ axis.ticks.length.y.left : NULL
$ axis.ticks.length.y.right : NULL
$ axis.ticks.length.theta : NULL
$ axis.ticks.length.r : NULL
$ axis.minor.ticks.length : 'rel' num 0.75
$ axis.minor.ticks.length.x : NULL
$ axis.minor.ticks.length.x.top : NULL
$ axis.minor.ticks.length.x.bottom: NULL
$ axis.minor.ticks.length.y : NULL
$ axis.minor.ticks.length.y.left : NULL
$ axis.minor.ticks.length.y.right : NULL
$ axis.minor.ticks.length.theta : NULL
$ axis.minor.ticks.length.r : NULL
$ axis.line : list()
..- attr(*, "class")= chr [1:2] "element_blank" "element"
$ axis.line.x : NULL
$ axis.line.x.top : NULL
$ axis.line.x.bottom : NULL
$ axis.line.y : NULL
$ axis.line.y.left : NULL
$ axis.line.y.right : NULL
$ axis.line.theta : NULL
$ axis.line.r : NULL
$ legend.background : list()
..- attr(*, "class")= chr [1:2] "element_blank" "element"
$ legend.margin : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
..- attr(*, "unit")= int 8
$ legend.spacing : 'simpleUnit' num 11points
..- attr(*, "unit")= int 8
$ legend.spacing.x : NULL
$ legend.spacing.y : NULL
$ legend.key : list()
..- attr(*, "class")= chr [1:2] "element_blank" "element"
$ legend.key.size : 'simpleUnit' num 1.2lines
..- attr(*, "unit")= int 3
$ legend.key.height : NULL
$ legend.key.width : NULL
$ legend.key.spacing : 'simpleUnit' num 5.5points
..- attr(*, "unit")= int 8
$ legend.key.spacing.x : NULL
$ legend.key.spacing.y : NULL
$ legend.frame : NULL
$ legend.ticks : NULL
$ legend.ticks.length : 'rel' num 0.2
$ legend.axis.line : NULL
$ legend.text :List of 11
..$ family : NULL
..$ face : NULL
..$ colour : NULL
..$ size : 'rel' num 0.8
..$ hjust : NULL
..$ vjust : NULL
..$ angle : NULL
..$ lineheight : NULL
..$ margin : NULL
..$ debug : NULL
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_text" "element"
$ legend.text.position : NULL
$ legend.title :List of 11
..$ family : NULL
..$ face : NULL
..$ colour : NULL
..$ size : NULL
..$ hjust : num 0
..$ vjust : NULL
..$ angle : NULL
..$ lineheight : NULL
..$ margin : NULL
..$ debug : NULL
..$ inherit.blank: logi TRUE
..- attr(*, "class")= chr [1:2] "element_text" "element"
$ legend.title.position : NULL
$ legend.position : chr "right"
$ legend.position.inside : NULL
$ legend.direction : NULL
$ legend.byrow : NULL
$ legend.justification : chr "center"
$ legend.justification.top : NULL
$ legend.justification.bottom : NULL
$ legend.justification.left : NULL
$ legend.justification.right : NULL
$ legend.justification.inside : NULL
$ legend.location : NULL
$ legend.box : NULL
$ legend.box.just : NULL
$ legend.box.margin : 'margin' num [1:4] 0cm 0cm 0cm 0cm
..- attr(*, "unit")= int 1
$ legend.box.background : list()
..- attr(*, "class")= chr [1:2] "element_blank" "element"
$ legend.box.spacing : 'simpleUnit' num 11points
..- attr(*, "unit")= int 8
[list output truncated]
- attr(*, "class")= chr [1:2] "theme" "gg"
- attr(*, "complete")= logi TRUE
- attr(*, "validate")= logi TRUE
# Identify top nodes by betweenness centrality
top_nodes <- mc3_graph1 %>%
as_tibble() %>%
filter(betweenness_centrality >= 3000000)head(top_nodes, n = 50)# A tibble: 50 × 8
id country type revenue ProductServices unmatched betweenness_centrality
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
1 Aaron… n.a. Pers… 0 <NA> drop 7756175.
2 Alan … n.a. Pers… 0 <NA> drop 7756175.
3 Aleja… Valtal… Pers… 0 <NA> drop 7231016.
4 Alex … Arvaros Pers… 0 <NA> drop 3442208
5 Alexa… n.a. Pers… 0 <NA> drop 7756175.
6 Amand… Mawala… Pers… 0 <NA> drop 7944565.
7 Amede… Novarc… Pers… 0 <NA> drop 4211301.
8 Amy A… Ariuzi… CEO 0 <NA> drop 3272037.
9 Amy B… Kondan… Pers… 0 <NA> drop 4044917.
10 Andre… Kondan… Pers… 0 <NA> drop 3914655.
# ℹ 40 more rows
# ℹ 1 more variable: closeness_centrality <dbl>
# Filter edges for atypical business transactions
atypical_edges <- mc3_edges %>%
filter(type %in% c("Shareholdership", "WorksFor", "BeneficialOwnership", "FamilyRelationship"))
# Extract nodes that are part of these transactions
atypical_nodes <- mc3_nodes %>%
filter(id %in% unique(c(atypical_edges$source, atypical_edges$target)))
# Create the graph object with filtered data
atypical_graph <- tbl_graph(nodes = atypical_nodes, edges = atypical_edges, directed = FALSE) %>%
mutate(betweenness_centrality = centrality_betweenness(),
closeness_centrality = centrality_closeness())
# Define higher centrality thresholds for more filtering
betweenness_threshold <- quantile(atypical_graph %>% activate(nodes) %>% pull(betweenness_centrality), 0.995)
closeness_threshold <- quantile(atypical_graph %>% activate(nodes) %>% pull(closeness_centrality), 0.995)
# Filter nodes based on higher centrality thresholds
filtered_graph <- atypical_graph %>%
activate(nodes) %>%
filter(betweenness_centrality >= betweenness_threshold | closeness_centrality >= closeness_threshold) %>%
activate(edges) %>%
filter(edge_is_between())
# Display the refined network graph
ggraph(filtered_graph, layout = "fr") + # Using Fruchterman-Reingold layout
geom_edge_link(aes(edge_alpha = 0.8, edge_width = 0.8)) + # Customize edge appearance
geom_node_point(aes(size = betweenness_centrality, color = closeness_centrality)) + # Customize node appearance
scale_color_viridis_c() + # Use viridis color scale
theme_void() + # Use a void theme
labs(title = "Refined Network Graph of Atypical Business Transactions",
subtitle = "Nodes colored by closeness centrality and sized by betweenness centrality",
caption = "Data Source: mc3.json") # Add titles and captions
Step 2 filtering out false positives with a centrality threshold
# Define higher centrality thresholds for more filtering
betweenness_threshold <- quantile(atypical_graph %>% activate(nodes) %>% pull(betweenness_centrality), 0.995)
closeness_threshold <- quantile(atypical_graph %>% activate(nodes) %>% pull(closeness_centrality), 0.995)
# Filter nodes based on higher centrality thresholds
filtered_graph <- atypical_graph %>%
activate(nodes) %>%
filter(betweenness_centrality >= betweenness_threshold | closeness_centrality >= closeness_threshold) %>%
activate(edges) %>%
filter(edge_is_between())
# Verify the filtered graph object
print(filtered_graph)# A tbl_graph: 2557 nodes and 2117 edges
#
# A bipartite multigraph with 1146 components
#
# Edge Data: 2,117 × 5 (active)
from to type tyear weights
<int> <int> <chr> <int> <int>
1 3 1199 BeneficialOwnership 2019 1
2 25 1199 BeneficialOwnership 2021 1
3 29 1199 BeneficialOwnership 2019 1
4 148 1199 BeneficialOwnership 2008 1
5 150 1199 BeneficialOwnership 2029 1
6 175 1199 BeneficialOwnership 2012 1
7 229 1199 BeneficialOwnership 2031 1
8 316 1199 BeneficialOwnership 2034 1
9 448 1199 BeneficialOwnership 2021 1
10 479 1199 BeneficialOwnership 2019 1
# ℹ 2,107 more rows
#
# Node Data: 2,557 × 7
id country type revenue ProductServices betweenness_centrality
<chr> <chr> <chr> <dbl> <chr> <dbl>
1 Abbott-Harrison Uzifri… Comp… 0 Food products 24317375.
2 Adams, Hernandez… Rio Is… Comp… 0 Unknown 0
3 Adams-Byrd Nalako… Comp… 147540. Offers a wide … 4730169.
# ℹ 2,554 more rows
# ℹ 1 more variable: closeness_centrality <dbl>
# Display the refined network graph
ggraph(filtered_graph, layout = "fr") + # Using Fruchterman-Reingold layout
geom_edge_link(aes(edge_alpha = 0.8, edge_width = 0.8)) + # Customize edge appearance
geom_node_point(aes(size = betweenness_centrality, color = closeness_centrality)) + # Customize node appearance
scale_color_viridis_c() + # Use viridis color scale
theme_void() + # Use a void theme
labs(title = "Refined Network Graph of Atypical Business Transactions",
subtitle = "Nodes colored by closeness centrality and sized by betweenness centrality",
caption = "Data Source: mc3.json") # Add titles and captions
Refining visualisation
We first will need to sift out edges with some form of ownership or working relationship. Following that, the centrality threshold will need to be defined so that most false positives are being filtered out through it and then finally using slice to only take the top 50 edges.
Step 3, take only the top 50 links
# Get top 50 links based on the sum of betweenness centrality of source and target nodes
top_50_links <- filtered_graph %>%
activate(edges) %>%
mutate(edge_betweenness_sum = .N()$betweenness_centrality[from] + .N()$betweenness_centrality[to]) %>%
arrange(desc(edge_betweenness_sum)) %>%
slice(1:50)
# Create a new graph object with the top 50 links
top_50_graph <- tbl_graph(nodes = filtered_graph %>% activate(nodes), edges = top_50_links, directed = FALSE)
# Identify the top nodes for labeling
top_nodes <- top_50_graph %>%
activate(nodes) %>%
as_tibble() %>%
arrange(desc(betweenness_centrality)) %>%
slice(1:10)
# Extract layout data for node positions
graph_layout <- create_layout(top_50_graph, layout = "fr")
# Display the refined network graph with top 50 links and node labels
ggraph(graph_layout) + # Using precomputed layout
geom_edge_link(aes(edge_alpha = 0.8, edge_width = 0.8), color = "gray") + # Customize edge appearance
geom_node_point(aes(size = betweenness_centrality, color = closeness_centrality)) + # Customize node appearance
geom_node_text(data = graph_layout, aes(x = x, y = y, label = id), repel = TRUE, size = 3, check_overlap = TRUE) + # Add labels to top nodes
scale_color_viridis_c() + # Use viridis color scale
theme_void() + # Use a void theme
labs(title = "Top 50 Links in Atypical Business Transactions",
subtitle = "Nodes colored by closeness centrality and sized by betweenness centrality",
caption = "Data Source: mc3.json") # Add titles and captionsWarning: ggrepel: 2547 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

# Extract the top 50 links based on the sum of betweenness centrality
top_50_links <- filtered_graph %>%
activate(edges) %>%
mutate(edge_betweenness_sum = .N()$betweenness_centrality[from] + .N()$betweenness_centrality[to]) %>%
arrange(desc(edge_betweenness_sum)) %>%
slice(1:50) %>%
as_tibble()
# Create a data frame with the relevant information
top_50_links_df <- top_50_links %>%
select(from, to, edge_betweenness_sum)Interpretation
Node Size (Betweenness Centrality):
- Larger nodes represent entities with higher betweenness centrality. These are nodes that frequently act as intermediaries in the shortest paths between other nodes. They are crucial for the flow of information or transactions in the network.
Node Color (Closeness Centrality):
- The color of the nodes indicates their closeness centrality. Closeness centrality measures how close a node is to all other nodes in the network. Nodes with higher closeness centrality (often lighter colors) can quickly interact with other nodes.
Labeled Nodes:
- The labels represent the most central nodes based on betweenness centrality. These are key entities in the network, acting as significant intermediaries in business transactions.
Node Distribution:
- The dense cluster of nodes in the center indicates a high level of interaction among these entities. Peripheral nodes might indicate entities that are less central but still part of significant transactions.
Observations
Highly Central Entities:
- Entities such as “Augustin Le Texier,” “Sullivan and Sons,” “Tullio Jacuzzi,” and others labeled on the graph are highly central in terms of their ability to broker connections between other nodes.
Dense Core:
- The central area of the graph is densely packed with nodes, indicating a high degree of interconnection among many entities. This suggests a tightly-knit network where many transactions or interactions occur.
Peripheral Nodes:
- Nodes on the periphery, though smaller and less central, still play a role in the network. Their interactions may be with the core or other peripheral nodes.
Potential Actions
Focus on Key Players:
- Entities with high betweenness and closeness centrality (large, brightly colored nodes) are critical for network connectivity. These entities might be key influencers or major players in business transactions.
Investigate Clusters:
- The dense central cluster indicates a closely connected group of entities. Investigating these clusters can reveal sub-networks or communities within the larger network.
Cluster Investigation
# Perform clustering on the graph
clustered_graph <- mc3_graph1 %>%
mutate(cluster = as.factor(group_louvain()))
# Calculate edge betweenness centrality for the entire graph
edge_betweenness_vals <- edge_betweenness(clustered_graph)
# Add edge betweenness centrality to the graph
clustered_graph <- clustered_graph %>%
activate(edges) %>%
mutate(edge_betweenness = edge_betweenness_vals)
# Highlight key nodes with betweenness centrality >= 3,000,000 and their clusters
key_nodes_and_clusters <- clustered_graph %>%
activate(nodes) %>%
filter(betweenness_centrality >= 3000000) %>%
pull(cluster) %>%
unique()
# Filter the graph to include only the key nodes and their clusters
filtered_graph <- clustered_graph %>%
activate(nodes) %>%
filter(cluster %in% key_nodes_and_clusters)
# Calculate edge betweenness sum for the filtered graph
filtered_graph <- filtered_graph %>%
activate(edges) %>%
mutate(edge_betweenness_sum = .N()$betweenness_centrality[from] + .N()$betweenness_centrality[to])
# Extract layout data for node positions
graph_layout <- create_layout(filtered_graph, layout = "fr")
# Enhanced plot with labels, colors, and improved legend
p <- ggraph(graph_layout) + # Using precomputed layout
geom_edge_link(aes(width = edge_betweenness_sum / max(edge_betweenness_sum),
alpha = edge_betweenness_sum / max(edge_betweenness_sum),
tooltip = edge_betweenness_sum),
color = "gray") + # Customize edge appearance
geom_node_point(aes(size = betweenness_centrality, color = cluster,
alpha = ifelse(betweenness_centrality >= 3000000, 1, 0.4),
tooltip = paste("ID:", id, "<br>Cluster:", cluster, "<br>Betweenness:", betweenness_centrality))) + # Add tooltip information
geom_node_text(aes(label = ifelse(betweenness_centrality >= 3000000, id, "")),
vjust = 1.5, hjust = 1.5, check_overlap = TRUE) +
scale_size_continuous(range = c(1, 10)) +
scale_color_manual(values = c("1" = "blue", "2" = "green", "3" = "red", "4" = "yellow", "5" = "purple", "6" = "orange", "7" = "pink", "8" = "cyan")) + # Customize based on the number of clusters
guides(edge_alpha = guide_legend(title = "Edge Alpha"),
size = guide_legend(title = "Betweenness Centrality"),
color = guide_legend(title = "Cluster")) +
theme_graph() +
theme(legend.position = "bottom") +
labs(title = "Enhanced Company Network Visualization",
subtitle = "Key Nodes Highlighted with Betweenness Centrality >= 3,000,000")Warning in geom_edge_link(aes(width =
edge_betweenness_sum/max(edge_betweenness_sum), : Ignoring unknown aesthetics:
tooltip
Warning in geom_node_point(aes(size = betweenness_centrality, color = cluster,
: Ignoring unknown aesthetics: tooltip
# Convert ggraph plot to plotly object
p_plotly <- ggplotly(p, tooltip = "tooltip")Warning in geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomEdgePath() has yet to be implemented in plotly.
If you'd like to see this geom implemented,
Please open an issue with your example code at
https://github.com/ropensci/plotly/issues
# Display the plotly interactive plot
p_plotlyQuestion 3
By analyzing the ownership structure, we tracked changes in most influential individuals (VIP) networks over time, identifying key individuals with increasing influence.
Part 1: Data Wrangling
Split the nodes into people and companies, and filter ownership-related edges
# Select crucial columns and fill missing values where appropriate
cleaned_nodes <- mc3_data[["nodes"]] %>%
select(id, type, country, HeadOfOrg, revenue,ProductServices,PointOfContact,founding_date,TradeDescription,dob,
`_last_edited_by`, `_last_edited_date`, `_date_added`, `_raw_source`, `_algorithm`) %>%
mutate(HeadOfOrg = ifelse(is.na(HeadOfOrg), "Unknown", HeadOfOrg),
revenue = ifelse(is.na(revenue), 0, revenue))
# Handle missing values in links
# Select crucial columns and fill missing values where appropriate
cleaned_links <- mc3_data[["links"]] %>%
select(key,source, target, type, start_date, end_date, `_last_edited_by`, `_last_edited_date`, `_date_added`, `_raw_source`, `_algorithm`) %>%
mutate(start_date = ifelse(is.na(start_date), "Unknown", start_date),
end_date = ifelse(is.na(end_date), "Unknown", end_date))
# Ensure proper data types
cleaned_nodes <- cleaned_nodes %>%
mutate(
id = as.character(id),
type = as.character(type),
country = as.character(country),
HeadOfOrg = as.character(HeadOfOrg),
revenue = as.numeric(revenue),
`_last_edited_by` = as.character(`_last_edited_by`),
`_last_edited_date` = as.character(`_last_edited_date`),
`_date_added` = as.character(`_date_added`),
`_raw_source` = as.character(`_raw_source`),
`_algorithm` = as.character(`_algorithm`)
)
cleaned_links <- cleaned_links %>%
mutate(
source = as.character(source),
target = as.character(target),
type = as.character(type),
start_date = as.character(start_date),
end_date = as.character(end_date),
`_last_edited_by` = as.character(`_last_edited_by`),
`_last_edited_date` = as.character(`_last_edited_date`),
`_date_added` = as.character(`_date_added`),
`_raw_source` = as.character(`_raw_source`),
`_algorithm` = as.character(`_algorithm`)
)Check for data types
# Ensure correct data types for nodes
cleaned_nodes <- cleaned_nodes %>%
mutate(
id = as.character(id),
type = as.character(type),
country = as.character(country),
HeadOfOrg = as.character(HeadOfOrg),
revenue = as.numeric(revenue),
dob = as.POSIXct(dob, format="%Y-%m-%dT%H:%M:%S"),
`_last_edited_by` = as.character(`_last_edited_by`),
`_last_edited_date` = as.POSIXct(`_last_edited_date`, format="%Y-%m-%dT%H:%M:%S"),
founding_date=as.POSIXct(founding_date, format="%Y-%m-%dT%H:%M:%S"),
`_date_added` = as.POSIXct(`_date_added`, format="%Y-%m-%dT%H:%M:%S"),
`_raw_source` = as.character(`_raw_source`),
`_algorithm` = as.character(`_algorithm`)
)
# Ensure correct data types for links
cleaned_links <- cleaned_links %>%
mutate(
source = as.character(source),
target = as.character(target),
type = as.character(type),
start_date = as.POSIXct(start_date, format="%Y-%m-%dT%H:%M:%S"),
end_date = as.POSIXct(end_date, format="%Y-%m-%dT%H:%M:%S"),
`_last_edited_by` = as.character(`_last_edited_by`),
`_last_edited_date` = as.POSIXct(`_last_edited_date`, format="%Y-%m-%dT%H:%M:%S"),
`_date_added` = as.POSIXct(`_date_added`, format="%Y-%m-%dT%H:%M:%S"),
`_raw_source` = as.character(`_raw_source`),
`_algorithm` = as.character(`_algorithm`)
)
# Print cleaned data for inspection
glimpse(cleaned_nodes)Rows: 60,520
Columns: 15
$ id <chr> "Abbott, Mcbride and Edwards", "Abbott-Gomez", "Ab…
$ type <chr> "Entity.Organization.Company", "Entity.Organizatio…
$ country <chr> "Uziland", "Mawalara", "Uzifrica", "Islavaragon", …
$ HeadOfOrg <chr> "Émilie-Susan Benoit", "Honoré Lemoine", "Jules La…
$ revenue <dbl> 5994.73, 71766.67, 0.00, 0.00, 4746.67, 46566.67, …
$ ProductServices <chr> "Unknown", "Furniture and home accessories", "Food…
$ PointOfContact <chr> "Rebecca Lewis", "Michael Lopez", "Steven Robertso…
$ founding_date <dttm> 1954-04-24, 2009-06-12, 2029-12-15, 1972-02-16, 1…
$ TradeDescription <chr> "Unknown", "Abbott-Gomez is a leading manufacturer…
$ dob <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `_last_edited_by` <chr> "Pelagia Alethea Mordoch", "Pelagia Alethea Mordoc…
$ `_last_edited_date` <dttm> 2035-01-01, 2035-01-01, 2035-01-01, 2035-01-01, 2…
$ `_date_added` <dttm> 2035-01-01, 2035-01-01, 2035-01-01, 2035-01-01, 2…
$ `_raw_source` <chr> "Existing Corporate Structure Data", "Existing Cor…
$ `_algorithm` <chr> "Automatic Import", "Automatic Import", "Automatic…
glimpse(cleaned_links)Rows: 75,817
Columns: 11
$ key <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ source <chr> "Avery Inc", "Berger-Hayes", "Bowers Group", "Bowm…
$ target <chr> "Allen, Nichols and Thompson", "Jensen, Morris and…
$ type <chr> "Event.Owns.Shareholdership", "Event.Owns.Sharehol…
$ start_date <dttm> 2016-10-29, 2035-06-03, 2028-11-20, 2024-09-04, 2…
$ end_date <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `_last_edited_by` <chr> "Pelagia Alethea Mordoch", "Niklaus Oberon", "Pela…
$ `_last_edited_date` <dttm> 2035-01-01, 2035-07-15, 2035-01-01, 2035-01-01, 2…
$ `_date_added` <dttm> 2035-01-01, 2035-07-15, 2035-01-01, 2035-01-01, 2…
$ `_raw_source` <chr> "Existing Corporate Structure Data", "Oceanus Corp…
$ `_algorithm` <chr> "Automatic Import", "Manual Entry", "Automatic Imp…
Changing field name
cleaned_nodes <- cleaned_nodes %>%
rename("last_edited_by" = "_last_edited_by",
"date_added" = "_date_added",
"last_edited_date" = "_last_edited_date",
"raw_source" = "_raw_source",
"algorithm" = "_algorithm")
cleaned_links<- cleaned_links %>%
rename("last_edited_by" = "_last_edited_by",
"date_added" = "_date_added",
"last_edited_date" = "_last_edited_date",
"raw_source" = "_raw_source",
"algorithm" = "_algorithm") Split ‘type’ column into separate columns
We are going to tidy the type column by creating two columns “entity2,entity3”.
word_list1 <- strsplit(cleaned_nodes$type, "\\.")
max_elements1 <- max(lengths(word_list1))
word_list_padded1 <- lapply(word_list1,
function(x) c(x, rep(NA, max_elements1 - length(x))))
word_df1 <- do.call(rbind, word_list_padded1)
colnames(word_df1) <- paste0("entity", 1:max_elements1)
word_df1 <- as_tibble(word_df1) %>%
select(entity2, entity3)
class(word_df1)[1] "tbl_df" "tbl" "data.frame"
The steps below will be used to split text in type column into two columns
word_list <- strsplit(cleaned_links$type, "\\.")
max_elements <- max(lengths(word_list))
word_list_padded <- lapply(word_list,
function(x) c(x, rep(NA, max_elements - length(x))))
word_df <- do.call(rbind, word_list_padded)
colnames(word_df) <- paste0("event", 1:max_elements)
word_df <- as_tibble(word_df) %>%
select(event2, event3)
class(word_df)[1] "tbl_df" "tbl" "data.frame"
Since the output above is a matrix, the code chunk above is used to convert word_df into a tibble data.frame.
cleaned_nodes <- cleaned_nodes %>%
cbind(word_df1)cleaned_links <- cleaned_links %>%
cbind(word_df)The code chunk above appends the extracted columns back to edges tibble data.frame.
write_rds(cleaned_nodes, "data/rds/cleaned_nodes.rds")
write_rds(cleaned_links, "data/rds/cleaned_links.rds")above code write into R rds file format.
Part 1: Data Wrangling
Split the nodes into people and companies, and filter ownership-related edges
# Split the nodes into people and companies
nodes_people <- cleaned_nodes %>% filter(entity2 == "Person")
nodes_company <- cleaned_nodes %>% filter(entity2 == "Organization")# Filter the links to include only ownership-related edges
links_owns <- cleaned_links %>% filter(event2 == "Owns")nodes_people <- nodes_people %>%
rowwise() %>%
mutate('no_owns' = sum(links_owns$source == id))
nodes_people$no_owns <- as.numeric(nodes_people$no_owns)# Calculate the unique counts of 'no_owns' and their corresponding counts and percentages
owns_summary <- nodes_people %>%
group_by(no_owns) %>%
summarise(count = n()) %>%
mutate(percentage = (count / sum(count)) * 100)
# Display the summary
print(owns_summary)# A tibble: 19 × 3
no_owns count percentage
<dbl> <int> <dbl>
1 0 147 0.285
2 1 46370 89.8
3 2 4032 7.81
4 3 665 1.29
5 4 245 0.474
6 5 80 0.155
7 6 34 0.0658
8 7 21 0.0407
9 8 11 0.0213
10 9 7 0.0136
11 10 2 0.00387
12 11 4 0.00774
13 12 3 0.00581
14 13 2 0.00387
15 15 1 0.00194
16 18 2 0.00387
17 29 1 0.00194
18 91 18 0.0349
19 92 4 0.00774
To define and identify influential people based on an ownership threshold. It filters the nodes to keep only those with a significant number of ownerships
# Define the threshold for 'influential'
vip_threshold <- 91
# Filter to keep only influential people and select relevant columns
vip <- nodes_people %>%
filter(no_owns >= vip_threshold) %>%
select(id, country, dob, last_edited_date, date_added, no_owns)
# Display the updated vip data frame
glimpse(vip)Rows: 22
Columns: 6
Rowwise:
$ id <chr> "Kelsey Ortega", "Joseph Gentry", "Cynthia Anderson",…
$ country <chr> "n.a.", "n.a.", "n.a.", "n.a.", "n.a.", "n.a.", "n.a.…
$ dob <dttm> 1974-11-26, 1980-11-08, 1991-07-23, 2013-10-03, 1981…
$ last_edited_date <dttm> 2035-01-01, 2035-01-01, 2035-01-01, 2035-01-01, 2035…
$ date_added <dttm> 2035-01-01, 2035-01-01, 2035-01-01, 2035-01-01, 2035…
$ no_owns <dbl> 91, 91, 91, 92, 91, 92, 91, 91, 91, 91, 91, 91, 91, 9…
filter the ownership connections to include only those involving these influential individuals
# Filter links_owns to keep only those connections where the source is in the vip list
vip_connections <- links_owns %>%
filter(source %in% vip$id)%>%
select(source, target,start_date,end_date,last_edited_date, date_added)
# Display the updated vip_connections data frame
glimpse(vip_connections)Rows: 2,006
Columns: 6
$ source <chr> "Kelsey Ortega", "Kelsey Ortega", "Kelsey Ortega", "K…
$ target <chr> "Mitchell-Glover", "Anderson, Smith and Weber", "Orr …
$ start_date <dttm> 2017-08-11, 2028-12-13, 2016-09-18, 2034-12-16, 2032…
$ end_date <dttm> NA, NA, NA, NA, 2035-07-13, NA, NA, NA, NA, NA, NA, …
$ last_edited_date <dttm> 2035-01-01, 2035-01-01, 2035-01-01, 2035-01-01, 2035…
$ date_added <dttm> 2035-01-01, 2035-01-01, 2035-01-01, 2035-01-01, 2035…
Part 1: Network Graph
Finally, plot the network, highlighting the structure and connections of influential individuals.
# Create graph from VIP connections
g_vip <- graph_from_data_frame(d = vip_connections, directed = TRUE)
# Identify VIPs and Companies
V(g_vip)$type <- ifelse(V(g_vip)$name %in% nodes_people$id, "VIP", "Company")
# Define colors and sizes
V(g_vip)$color <- ifelse(V(g_vip)$type == "VIP", "blue", "orange")
V(g_vip)$size <- ifelse(V(g_vip)$type == "VIP", 8, 5)
# Plot the network
plot(g_vip, vertex.label = NA, vertex.size = V(g_vip)$size, edge.arrow.size = 0.5,
vertex.color = V(g_vip)$color, main = "VIP Connections Network")
The plot represents the VIP Connections Network, with blue nodes indicating influential VIPs and orange nodes representing companies they own. Directed edges illustrate ownership, pointing from VIPs to companies. This visualization highlights the dense centrality of VIPs, showcasing their extensive control across multiple companies. By examining these connections, we can infer the structure and extent of VIP influence within the network and help FishEye identify influential individuals within the business network, highlighting ownership structures and central figures. By tracking ownership changes over time, FishEye can pinpoint who controls companies involved in illegal fishing activities.
While this plot provides a static snapshot, in the following we shall create similar plots for different time periods can reveal changes in ownership and influence over time.
Part 2: Temporal Analysis
Aggregate Ownership Changes by Year
change_over_time1 <- links_owns %>%
group_by(start_date) %>%
summarize(count = n()) %>%
drop_na()
links_owns<- links_owns %>%
mutate(start_year = format(start_date, "%Y"))
# Aggregate ownership changes by year
change_over_time <- links_owns %>%
group_by(start_year) %>%
summarize(count = n()) %>%
drop_na()Create plots to visualize the changes in ownership over time.
# Plot changes over time
ggplot(change_over_time, aes(x = as.numeric(start_year), y = count)) +
geom_line() +
geom_point() +
labs(title = "Changes in Ownership Over Time",
x = "Year",
y = "Number of Ownership Changes") +
theme_minimal()
Part 2: Network Graph by Year
Given the significant increase in data from 2004 onwards, focusing on every 10 years from 2005 to 2035 would provide a more detailed analysis of changes in ownership and influence.
# Specify the year
filter_year <- 2005
# Filter vip_connections by start_year
vip_connections_filtered <- vip_connections %>%
filter(format(start_date, "%Y") == filter_year)
# Create the graph object from the filtered vip_connections
g_vip_filtered <- graph_from_data_frame(d = vip_connections_filtered, directed = TRUE)
# Identify VIPs (nodes_people) and Companies
V(g_vip_filtered)$type <- ifelse(V(g_vip_filtered)$name %in% nodes_people$id, "VIP", "Company")
# Define colors and sizes
V(g_vip_filtered)$color <- ifelse(V(g_vip_filtered)$type == "VIP", "blue", "orange")
V(g_vip_filtered)$size <- ifelse(V(g_vip_filtered)$type == "VIP", 8, 5)
# Plot the network
p2005<-plot(g_vip_filtered, vertex.label = NA, vertex.size = V(g_vip_filtered)$size, edge.arrow.size = 0.5,
vertex.color = V(g_vip_filtered)$color, main = paste("VIP Connections Network for", filter_year))
In 2005, the network shows a relatively sparse structure with a moderate number of connections. VIPs (blue nodes) are moderately interconnected, indicating a balanced distribution of influence among several key players.
# Specify the year
filter_year <- 2015
# Filter vip_connections by start_year
vip_connections_filtered_2015 <- vip_connections %>%
filter(format(start_date, "%Y") == filter_year)
# Create the graph object from the filtered vip_connections
g_vip_filtered_2015 <- graph_from_data_frame(d = vip_connections_filtered_2015, directed = TRUE)
# Identify VIPs (nodes_people) and Companies
V(g_vip_filtered_2015)$type <- ifelse(V(g_vip_filtered_2015)$name %in% nodes_people$id, "VIP", "Company")
# Define colors and sizes
V(g_vip_filtered_2015)$color <- ifelse(V(g_vip_filtered_2015)$type == "VIP", "blue", "orange")
V(g_vip_filtered_2015)$size <- ifelse(V(g_vip_filtered_2015)$type == "VIP", 8, 5)
# Plot the network
p2015 <- plot(g_vip_filtered_2015, vertex.label = NA, vertex.size = V(g_vip_filtered_2015)$size, edge.arrow.size = 0.5,
vertex.color = V(g_vip_filtered_2015)$color, main = paste("VIP Connections Network for", filter_year))
Question 4
For part 1, the focus was on identifying the network associated with SouthSeafood Express Corp and visualizing how this network and competing businesses changed as a result of their illegal fishing behavior.
Part 1: Identify SouthSeafood Express Corp Node
Locate the node representing SouthSeafood Express Corp in the network.
Create a visualization of the network associated with SouthSeafood Express Corp before any changes.
# Extract edges connected to SouthSeafood Express Corp
southseafood_edges <- cleaned_links %>%
filter(source == "SouthSeafood Express Corp" | target == "SouthSeafood Express Corp")%>%
select(source,target,start_date,end_date,event2)
# Ensure all nodes in the edge list are present in the vertex data frame
southseafood_nodes <- cleaned_nodes %>%
filter(id %in% (c(southseafood_edges$source, southseafood_edges$target)))
# Join edges with nodes to ensure all nodes are present
southseafood_edges <- southseafood_edges %>%
filter(source %in% southseafood_nodes$id & target %in% southseafood_nodes$id)
# Create graph object for the sub-network
g_southseafood <- graph_from_data_frame(d = southseafood_edges, vertices = southseafood_nodes, directed = TRUE)
# Visualize the initial network
plot(g_southseafood, vertex.label = NA, vertex.size = 5, edge.arrow.size = 0.5,
vertex.color = "orange", main = "Network Associated with SouthSeafood Express Corp")
Part 1: Identify Competing Businesses
Identify and highlight competing businesses within the extracted sub-network.
competing_businesses <- cleaned_nodes %>%
filter(entity3 == "FishingCompany" & id != "SouthSeafood Express Corp")competing_edges <- cleaned_links %>%
filter(source %in% competing_businesses$id | target %in% competing_businesses$id) %>%
select(source, target, start_date, end_date, event2)
# Combine SouthSeafood Express Corp edges with competing businesses edges
combined_edges <- bind_rows(southseafood_edges, competing_edges)
# Extract the combined set of nodes
combined_nodes <- cleaned_nodes %>%
filter(id %in% c(combined_edges$source, combined_edges$target))# Create graph object for the combined network
g_combined <- graph_from_data_frame(d = combined_edges, vertices = combined_nodes, directed = TRUE)Part 1: Analyze Temporal Changes based on start_year
Filter the data to show the network before and after the illegal fishing incident(assume the incident happened in 2023)
Create visualizations to compare the network structure and connections before and after the incident.
# Assume the accident happened in 2023
incident_year <- 2023
# Filter edges before the incident
edges_before <- combined_edges %>%
filter(format(start_date, "%Y") < incident_year)
# Filter edges after the incident
edges_after <- combined_edges %>%
filter(format(start_date, "%Y") >= incident_year)
# Create graph objects for before and after the incident
g_before <- graph_from_data_frame(d = edges_before, vertices = combined_nodes, directed = TRUE)
g_after <- graph_from_data_frame(d = edges_after, vertices = combined_nodes, directed = TRUE)Part 1: Visualize the Temporal Changes
Identify and highlight significant changes in connections and structure due to the illegal fishing behavior and subsequent closure.
par(mfrow = c(2, 1))
plot_before <- ggraph(g_before, layout = "fr") +
geom_edge_link(aes(edge_alpha = 0.8), show.legend = FALSE, color = "gray", width = 1) +
geom_node_point(aes(color = ifelse(name == "SouthSeafood Express Corp", "SouthSeafood",
ifelse(type == "Entity.Organization.FishingCompany", "FishingCompany", "Other"))),
size = 3, alpha = 0.6, show.legend = TRUE) + # Adjusted alpha for transparency
scale_color_manual(values = c("SouthSeafood" = "red", "FishingCompany" = "blue", "Other" = "orange"),
name = "Type") + # Shortened legend title
theme_void() +
theme(legend.position = "bottom") +
labs(title = "Network Before Incident")
# Show the plot for the network before the incident
plot_before
plot_after <- ggraph(g_after, layout = "fr") +
geom_edge_link(aes(edge_alpha = 0.8), show.legend = FALSE, color = "gray", width = 1) +
geom_node_point(aes(color = ifelse(name == "SouthSeafood Express Corp", "SouthSeafood",
ifelse(type == "Entity.Organization.FishingCompany", "FishingCompany", "Other"))),
size = 3, alpha = 0.6, show.legend = TRUE) + # Adjusted alpha for transparency
scale_color_manual(values = c("SouthSeafood" = "red", "FishingCompany" = "blue", "Other" = "orange"),
name = "Type") + # Shortened legend title
theme_void() +
theme(legend.position = "bottom") +
labs(title = "Network After Incident")
# Show the plot for the network after the incident
plot_after
Observations:
The number of blue nodes (fishing companies) appears to have decreased.
SouthSeafood Express Corp (red node) remains central but its connections might have changed, indicating possible impact from the incident.
For part 2, since we cannot use revenue data over time, we will focus on identifying which companies potentially benefited from SouthSeafood Express Corp’s legal troubles by analyzing changes in network centrality measures.
Part 2: Calculate Centrality Measures Before and After the Incident
# Calculate degree centrality before the incident
degree_before <- degree(g_before, mode = "all")
# Calculate degree centrality after the incident
degree_after <- degree(g_after, mode = "all")
# Combine degree centrality measures into a data frame
centrality_change <- data.frame(
id = names(degree_before),
degree_before = degree_before,
degree_after = degree_after
)
# Calculate the change in degree centrality
centrality_change <- centrality_change %>%
mutate(change = degree_after - degree_before)
# Display companies with the most positive change in degree centrality
top_beneficiaries <- centrality_change %>%
arrange(desc(change)) %>%
head(10)
print(top_beneficiaries) id degree_before degree_after change
Anderson-Roberts Anderson-Roberts 0 36 36
Hall, Hartman and Hall Hall, Hartman and Hall 0 30 30
Kirk Inc Kirk Inc 0 18 18
Watson-Gray Watson-Gray 0 18 18
Parker Inc Parker Inc 0 17 17
Mullins-Carrillo Mullins-Carrillo 0 15 15
Torres, Ross and Brown Torres, Ross and Brown 0 14 14
Byrd and Sons Byrd and Sons 0 13 13
Haynes-Lucero Haynes-Lucero 0 13 13
Lutz-Fleming Lutz-Fleming 0 13 13
Part 2: Determine Entity Type
# Merge with cleaned_nodes to get the entity type
top_beneficiaries_info <- top_beneficiaries %>%
left_join(cleaned_nodes, by = c("id" = "id")) %>%
select(id, change,entity3)
# Display the entity type of top beneficiaries
print(top_beneficiaries_info) id change entity3
1 Anderson-Roberts 36 FishingCompany
2 Hall, Hartman and Hall 30 FishingCompany
3 Kirk Inc 18 FishingCompany
4 Watson-Gray 18 FishingCompany
5 Parker Inc 17 FishingCompany
6 Mullins-Carrillo 15 FishingCompany
7 Torres, Ross and Brown 14 FishingCompany
8 Byrd and Sons 13 FishingCompany
9 Haynes-Lucero 13 FishingCompany
10 Lutz-Fleming 13 FishingCompany
Part 2: Visualize the Changes
# Bar plot of top beneficiaries
ggplot(top_beneficiaries_info, aes(x = reorder(id, change), y = change)) +
geom_bar(stat = "identity", fill = "lightblue") +
coord_flip() +
theme_minimal() +
labs(title = "Top Beneficiaries by Change in Degree Centrality",
x = "Company",
y = "Change in Degree Centrality",
fill = "Entity Type") +
theme(legend.position = "none")
The results show that the top beneficiaries, all classified as fishing companies, significantly increased their network centrality following SouthSeafood Express Corp’s legal troubles. Anderson-Roberts, Hall, Hartman and Hall, and Kirk Inc., among others, saw the largest gains, suggesting they capitalized on the shift in the network’s structure.
By 2015, the network has grown denser, suggesting increased interconnectedness and influence consolidation. More VIPs are connected to multiple companies (orange nodes), indicating a significant rise in their influence and control over the network.
# Specify the year
filter_year <- 2025
# Filter vip_connections by start_year
vip_connections_filtered_2025 <- vip_connections %>%
filter(format(start_date, "%Y") == filter_year)
# Create the graph object from the filtered vip_connections
g_vip_filtered_2025 <- graph_from_data_frame(d = vip_connections_filtered_2025, directed = TRUE)
# Identify VIPs (nodes_people) and Companies
V(g_vip_filtered_2025)$type <- ifelse(V(g_vip_filtered_2025)$name %in% nodes_people$id, "VIP", "Company")
# Define colors and sizes
V(g_vip_filtered_2025)$color <- ifelse(V(g_vip_filtered_2025)$type == "VIP", "blue", "orange")
V(g_vip_filtered_2025)$size <- ifelse(V(g_vip_filtered_2025)$type == "VIP", 8, 5)
# Plot the network
p2025 <- plot(g_vip_filtered_2025, vertex.label = NA, vertex.size = V(g_vip_filtered_2025)$size, edge.arrow.size = 0.5,
vertex.color = V(g_vip_filtered_2025)$color, main = paste("VIP Connections Network for", filter_year))
The network continues to expand in 2025, displaying even more complexity and interconnections. This period likely represents a peak in influence for several VIPs, with many of them owning shares in numerous companies, suggesting increased market control.
# Specify the year
filter_year <- 2035
# Filter vip_connections by start_year
vip_connections_filtered_2035 <- vip_connections %>%
filter(format(start_date, "%Y") == filter_year)
# Create the graph object from the filtered vip_connections
g_vip_filtered_2035 <- graph_from_data_frame(d = vip_connections_filtered_2035, directed = TRUE)
# Identify VIPs (nodes_people) and Companies
V(g_vip_filtered_2035)$type <- ifelse(V(g_vip_filtered_2035)$name %in% nodes_people$id, "VIP", "Company")
# Define colors and sizes
V(g_vip_filtered_2035)$color <- ifelse(V(g_vip_filtered_2035)$type == "VIP", "blue", "orange")
V(g_vip_filtered_2035)$size <- ifelse(V(g_vip_filtered_2035)$type == "VIP", 8, 5)
# Plot the network
p2035 <- plot(g_vip_filtered_2035, vertex.label = NA, vertex.size = V(g_vip_filtered_2035)$size, edge.arrow.size = 0.5,
vertex.color = V(g_vip_filtered_2035)$color, main = paste("VIP Connections Network for", filter_year))
In 2035, the network structure shifts to a star-like formation, where a central VIP appears to have gained substantial influence, with direct connections to numerous companies. This indicates a significant consolidation of power and influence, where a few key players dominate the network.
Initially, influence is distributed among several key players, but over the years, it becomes concentrated among fewer individuals, leading to a highly centralized network by 2035. This centralization of power can be both an opportunity for streamlined decision-making and a risk for monopolistic control. Monitoring these changes is crucial for regulatory bodies like FishEye to ensure fair practices and prevent illegal activities within the network.